Document image analysis: What is missing?
Internal identifier: 002B11 (Main/Exploration); previous: 002B10; next: 002B12
Authors: George Nagy (computer scientist) [United States]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ] ; 1995.
Abstract
The conversion of documents into electronic form has proved more difficult than anticipated. Document image analysis still accounts for only a small fraction of the rapidly expanding document imaging market. Nevertheless, the optimism manifested over the last thirty years has not dissipated. Driven partly by document distribution on CD-ROM and via the World Wide Web, there is more interest in the preservation of layout and format attributes to increase legibility (sometimes called “page reconstruction”) rather than just text/non-text separation. The realization that accurate document image analysis requires fairly specific pre-stored information has resulted in the investigation of new data structures for knowledge bases and for the representation of the results of partial analysis. At the same time, the requirements of downstream software, such as word processing, information retrieval and computer-aided design applications, favor turning the results of the analysis and recognition into some standard format like SGML or DXF. There is increased emphasis on large-scale, automated comparative evaluation, using laboriously compiled test databases. The cost of generating these databases has stimulated new research on synthetic noise models. According to recent publications, the accurate conversion of business letters, technical reports, large typeset repositories like patents, postal addresses, specialized line drawings, and office forms containing a mix of handprinted, handwritten and printed material is finally on the verge of success.
DOI: 10.1007/3-540-60298-4_317
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000018
- to stream Istex, to step Curation: 000018
- to stream Istex, to step Checkpoint: 001E68
- to stream Main, to step Merge: 002C68
- to stream Main, to step Curation: 002B11
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Document image analysis: What is missing?</title>
<author><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation><country>États-Unis</country>
<placeName><settlement type="city">Troy (New York</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:98ED88BBEA3D672C8B6B4CD53334028B97E12076</idno>
<date when="1995" year="1995">1995</date>
<idno type="doi">10.1007/3-540-60298-4_317</idno>
<idno type="url">https://api.istex.fr/document/98ED88BBEA3D672C8B6B4CD53334028B97E12076/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000018</idno>
<idno type="wicri:Area/Istex/Curation">000018</idno>
<idno type="wicri:Area/Istex/Checkpoint">001E68</idno>
<idno type="wicri:doubleKey">0302-9743:1995:Nagy G:document:image:analysis</idno>
<idno type="wicri:Area/Main/Merge">002C68</idno>
<idno type="wicri:Area/Main/Curation">002B11</idno>
<idno type="wicri:Area/Main/Exploration">002B11</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Document image analysis: What is missing?</title>
<author><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>ECSE, RPI, 12180-3590, Troy, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
<placeName><settlement type="city">Troy (New York</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
<placeName><settlement type="city">Troy (New York</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>1995</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">98ED88BBEA3D672C8B6B4CD53334028B97E12076</idno>
<idno type="DOI">10.1007/3-540-60298-4_317</idno>
<idno type="ChapterID">89</idno>
<idno type="ChapterID">Chap89</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: The conversion of documents into electronic form has proved more difficult than anticipated. Document image analysis still accounts for only a small fraction of the rapidly-expanding document imaging market. Nevertheless, the optimism manifested over the last thirty years has not dissipated. Driven partly by document distribution on CD-ROM and via the World Wide Web, there is more interest in the preservation of layout and format attributes to increase legibility (sometimes called “page reconstruction”) rather than just text/non-text separation. The realization that accurate document image analysis requires fairly specific pre-stored information has resulted in the investigation of new data structures for knowledge bases and for the representation of the results of partial analysis. At the same time, the requirements of downstream software, such as word processing, information retrieval and computer-aided design applications, favor turning the results of the analysis and recognition into some standard format like SGML or DXF. There is increased emphasis on large-scale, automated comparative evaluation, using laboriously compiled test databases. The cost of generating these databases has stimulated new research on synthetic noise models. According to recent publications, the accurate conversion of business letters, technical reports, large typeset repositories like patents, postal addresses, specialized line drawings, and office forms containing a mix of handprinted, handwritten and printed material, is finally on the verge of success.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
<settlement><li>Troy (New York</li>
</settlement>
<orgName><li>Institut polytechnique Rensselaer</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="État de New York"><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
</region>
<name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002B11 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002B11 | SxmlIndent | more
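Once a record has been extracted with a command like those above, its identifiers can be pulled out with a short script. The following is a minimal sketch in Python, assuming the record text is already available as a string. Because the record uses the `wicri:` prefix without declaring it, a strict XML parser may reject it, so the sketch falls back on a regular expression. The function name `extract_idnos` and the `sample` string are hypothetical and are not part of Dilib.

```python
import re
from collections import defaultdict

# Matches <idno type="...">value</idno>.
# Assumption: idno elements carry a single type attribute and no nested markup.
IDNO_RE = re.compile(r'<idno\s+type="([^"]+)">([^<]*)</idno>')


def extract_idnos(record_text):
    """Map each idno type to its distinct values, in order of appearance."""
    found = defaultdict(list)
    for idno_type, value in IDNO_RE.findall(record_text):
        # The record repeats some identifiers (the ISSN, for instance),
        # so each value is kept only once.
        if value not in found[idno_type]:
            found[idno_type].append(value)
    return dict(found)


# A shortened excerpt of the record shown on this page.
sample = '''<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="doi">10.1007/3-540-60298-4_317</idno>
<idno type="wicri:Area/Main/Exploration">002B11</idno>
</publicationStmt>
<series><idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>'''

ids = extract_idnos(sample)
print(ids["doi"])
print(ids["ISSN"])
```

A regular expression is used here only because of the undeclared prefix. If the record were wrapped in an element that declares the `wicri` namespace, a real XML parser would be the more robust choice.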
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:98ED88BBEA3D672C8B6B4CD53334028B97E12076 |texte= Document image analysis: What is missing? }}
This area was generated with Dilib version V0.6.32.